
@CloseChoice
Contributor

@CloseChoice CloseChoice commented Oct 28, 2025

Supports #7804.
Adds support for the DICOM file format.

This PR follows PR #7815 and PR #7325 closely.
Notable differences:
I made sure that we can load all of pydicom's test data. While doing so I came across pydicom's force=True parameter, which this PR explicitly supports. It makes it possible to attempt loading corrupted DICOM files, and we explicitly test this.
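
For context, a minimal sketch of the kind of check that force=True relaxes. A standard DICOM file starts with a 128-byte preamble followed by the four bytes DICM. By default pydicom's dcmread raises InvalidDicomError when that header is missing, and force=True tells it to try reading anyway. The helper name has_dicom_prefix is hypothetical and is not part of this PR or of pydicom; it only illustrates the file layout using the standard library.

```python
# Hypothetical helper for illustration only; not part of this PR or pydicom.
DICOM_PREAMBLE_LENGTH = 128   # bytes of preamble before the magic marker
DICOM_MAGIC = b"DICM"         # marker that follows the preamble


def has_dicom_prefix(data: bytes) -> bool:
    """Return True if `data` has a 128-byte preamble followed by b"DICM"."""
    start = DICOM_PREAMBLE_LENGTH
    return data[start:start + len(DICOM_MAGIC)] == DICOM_MAGIC


# A well-formed start of file: empty preamble, then the marker.
well_formed = bytes(128) + b"DICM" + b"\x02\x00\x00\x00"
# A headerless byte stream that begins directly with a data element.
headerless = b"\x08\x00\x05\x00CS\x0a\x00ISO_IR 100"

print(has_dicom_prefix(well_formed))  # True
print(has_dicom_prefix(headerless))   # False
```

Files like the second one are the ones that need force=True to be read at all; whether the resulting dataset is usable still depends on how damaged the file is.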

There is a dataset with all of pydicom's test data on Hugging Face, which can be loaded using this branch with the following script:

from datasets import load_dataset
from datasets import Features, ClassLabel
from datasets.features import Dicom

features = Features({
    "dicom": Dicom(force=True),  # necessary to be able to load one corrupted file
    "label": ClassLabel(num_classes=2)
})

ds = load_dataset("TobiasPitters/dicom-sample-dataset",
                  features=features)

error_count = 0

for idx, item in enumerate(ds["test"]):
    dicom = item["dicom"]

    try:
        print(f"Type: {type(dicom)}")
        if hasattr(dicom, 'PatientID'):
            print(f"PatientID: {dicom.PatientID}")
        if hasattr(dicom, 'StudyInstanceUID'):
            print(f"StudyInstanceUID: {dicom.StudyInstanceUID}")
        if hasattr(dicom, 'Modality'):
            print(f"Modality: {dicom.Modality}")
    except Exception as e:
        error_count += 1
        print(e)

print(f"Finished processing with {error_count} errors.")
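
The per-tag hasattr checks in the loop above could also be folded into a small helper. This is only a sketch: the name summarize and the stand-in object are hypothetical, and it assumes the decoded item supports normal attribute access, as the script above already relies on. Unlike hasattr, it also skips attributes whose value is None.

```python
from types import SimpleNamespace

# Tags printed by the script above.
TAGS = ("PatientID", "StudyInstanceUID", "Modality")


def summarize(dicom, tags=TAGS):
    """Collect the requested attributes that are present (and not None) on `dicom`."""
    summary = {}
    for tag in tags:
        value = getattr(dicom, tag, None)  # None when the attribute is missing
        if value is not None:
            summary[tag] = value
    return summary


# Stand-in object so the sketch runs without any DICOM data.
fake = SimpleNamespace(PatientID="anon-001", Modality="CT")
print(summarize(fake))  # {'PatientID': 'anon-001', 'Modality': 'CT'}
```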

todo:

  • add docs (will do so soon)

@CloseChoice CloseChoice marked this pull request as ready for review October 28, 2025 11:35
@CloseChoice CloseChoice changed the title from "Add pydicom support" to "Add DICOM support" Oct 28, 2025
@lhoestq
Member

lhoestq commented Nov 5, 2025

Awesome! For the docs, should we rename https://huggingface.co/docs/datasets/nifti_dataset to medical_imaging_dataset and have both DICOM and NIfTI together, or have separate pages, in your opinion?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@CloseChoice
Contributor Author

> Awesome! For the docs, should we rename https://huggingface.co/docs/datasets/nifti_dataset to medical_imaging_dataset and have both DICOM and NIfTI together, or have separate pages, in your opinion?

Makes sense: a single page is more intuitive for the user, and the two pages as proposed in this branch have a lot of overlap. I would structure it to start with a brief introduction to medical imaging and then introduce the formats (basically concatenating the two pages and removing duplicates).
